Load libraries and aggregated data
# Double-check that the generated data is consistent with the data dictionary
aggregate_data %>% filter(sheep == "S18") %>%
filter(WaveFront == "SR",Catheter_Type == "Penta", Point_Number == 4815) %>%
select(signal) %>%
unlist() %>%
plot(type = "l")
For comparison, the raw signal for the same point looks like this:
aggregate_data %>% filter(sheep == "S18") %>%
filter(WaveFront == "SR",Catheter_Type == "Penta", Point_Number == 4815) %>%
select(rawsignal) %>%
unlist() %>%
plot(type = "l")
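The two plots above are easier to compare when drawn on the same axes. The following is a minimal sketch, assuming `signal` and `rawsignal` are list-columns holding one numeric vector per point; the `toy` tibble and its sine-wave values are made up purely for illustration and stand in for `aggregate_data`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for aggregate_data: one point with a raw and a cleaned trace.
x <- sin(seq(0, 6, length.out = 200))
toy <- tibble(
  sheep = "S18", WaveFront = "SR", Catheter_Type = "Penta", Point_Number = 4815,
  rawsignal = list(x + 0.5),  # made-up raw trace with an offset
  signal = list(x)            # made-up cleaned trace
)

point <- toy %>%
  filter(sheep == "S18", WaveFront == "SR",
         Catheter_Type == "Penta", Point_Number == 4815)

# Guard against the filter matching zero or several points.
stopifnot(nrow(point) == 1)

# Pull the vectors out of the list-columns.
raw <- point$rawsignal[[1]]
clean <- point$signal[[1]]

# Draw both traces on shared axes so the y-range covers both.
plot(raw, type = "l", col = "grey50", ylim = range(c(raw, clean)),
     xlab = "Sample", ylab = "Amplitude")
lines(clean, col = "red")
legend("topright", legend = c("rawsignal", "signal"),
       col = c("grey50", "red"), lty = 1)
```

The `stopifnot()` guard is there because `unlist()` on several matching rows would silently concatenate their signals into one trace.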
High-level split of the count of observations with and without histology info. Note: the histology data is sometimes blank and a signal is sometimes not present. Observations that have a histology label (0 or 1) are generally outnumbered by those that don't, most markedly for S15 and S20; S12 is the exception.
aggregate_data %>% group_by(sheep,Catheter_Type,WaveFront) %>%
summarise(histology_count = sum(!is.na(endocardium_scar)),
no_histology_count = sum(is.na(endocardium_scar))) %>%
arrange(sheep,desc(histology_count))
## `summarise()` has grouped output by 'sheep', 'Catheter_Type'. You can override
## using the `.groups` argument.
## # A tibble: 18 × 5
## # Groups: sheep, Catheter_Type [6]
## sheep Catheter_Type WaveFront histology_count no_histology_count
## <fct> <fct> <fct> <int> <int>
## 1 S12 Penta SR 709 574
## 2 S12 Penta RVp 666 511
## 3 S12 Penta LVp 291 206
## 4 S15 Penta SR 792 1768
## 5 S15 Penta LVp 402 1648
## 6 S15 Penta RVp 390 1333
## 7 S17 Penta SR 456 501
## 8 S17 Penta LVp 391 652
## 9 S17 Penta RVp 391 769
## 10 S18 Penta SR 1329 1270
## 11 S18 Penta LVp 1143 1200
## 12 S18 Penta RVp 977 990
## 13 S20 Penta LVp 859 1589
## 14 S20 Penta SR 582 1008
## 15 S20 Penta RVp 478 1090
## 16 S9 Penta RVp 607 962
## 17 S9 Penta LVp 577 712
## 18 S9 Penta Ap 511 958
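To make the labelled versus unlabelled split easier to read, the counts can be turned into a fraction per sheep. This is a small sketch: the per-sheep totals below were obtained by summing the three WaveFront rows of the table above, and the column name `labelled_fraction` is introduced here for illustration only.

```r
library(dplyr)
library(tibble)

# Per-sheep totals, summed over WaveFront from the table above.
coverage <- tribble(
  ~sheep, ~histology_count, ~no_histology_count,
  "S12",  1666,             1291,
  "S15",  1584,             4749,
  "S17",  1238,             1922,
  "S18",  3449,             3460,
  "S20",  1919,             3687,
  "S9",   1695,             2632
) %>%
  # Share of observations that carry a histology label (0 or 1).
  mutate(labelled_fraction = histology_count /
           (histology_count + no_histology_count)) %>%
  arrange(labelled_fraction)

print(coverage)
```

Sorting by `labelled_fraction` puts the sheep with the least histology coverage first.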
Looking at the cleaned data only, some groups (for example S20 with the SR wavefront) have more imbalanced Scar versus NoScar labels:
cleaned_aggregate_data %>% filter(!is.null(signal)) %>%
filter(!is.na(endocardium_scar)) %>%
group_by(sheep,Catheter_Type,WaveFront, Categorical_Label) %>%
summarise(count = n())
## `summarise()` has grouped output by 'sheep', 'Catheter_Type', 'WaveFront'. You
## can override using the `.groups` argument.
## # A tibble: 36 × 5
## # Groups: sheep, Catheter_Type, WaveFront [18]
## sheep Catheter_Type WaveFront Categorical_Label count
## <fct> <fct> <fct> <chr> <int>
## 1 S12 Penta LVp NoScar 100
## 2 S12 Penta LVp Scar 191
## 3 S12 Penta RVp NoScar 209
## 4 S12 Penta RVp Scar 457
## 5 S12 Penta SR NoScar 188
## 6 S12 Penta SR Scar 521
## 7 S15 Penta LVp NoScar 158
## 8 S15 Penta LVp Scar 244
## 9 S15 Penta RVp NoScar 152
## 10 S15 Penta RVp Scar 238
## # ℹ 26 more rows
The usable data shrinks because not every data point has signal info and not every point has histology info (blanks in the cleaned_histology_all file).
We feed both the filtered data (rows with blanks in cleaned_histology_all or with no signal removed) and the imputed data (blank labels treated as zeros) to the Orange data mining analysis.
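The difference between the filtered and imputed variants can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: the `toy` tibble and its values are made up, and it assumes missing signals are stored as `NULL` entries in a list-column. Because `is.null(signal)` on an ordinary (non-rowwise) data frame tests the whole column at once, the sketch uses `lengths(signal) > 0`, which works row by row either way.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the aggregated data.
toy <- tibble(
  Point_Number = 1:4,
  endocardium_scar = c(1, NA, 0, NA),  # NA marks a blank histology label
  signal = list(c(0.1, 0.2), NULL, c(0.3, 0.1), c(0.0, 0.4))
)

# Filtered variant: keep only points with both a signal and a histology label.
filtered <- toy %>%
  filter(lengths(signal) > 0, !is.na(endocardium_scar))

# Imputed variant: keep points with a signal and treat blank labels as 0.
imputed <- toy %>%
  filter(lengths(signal) > 0) %>%
  mutate(endocardium_scar = coalesce(endocardium_scar, 0))

print(filtered$Point_Number)
print(imputed$endocardium_scar)
```

The imputed variant keeps more rows, at the cost of labelling every unexamined point as 0.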
Across all observations, the proportions of Scar and NoScar are roughly balanced:
aggregate_data %>% filter(!is.null(signal)) %>% count(Categorical_Label)
## # A tibble: 2 × 2
## # Rowwise:
## Categorical_Label n
## <fct> <int>
## 1 Scar 16201
## 2 NoScar 13091
Notably, even though S20 is the control subject, it still has a significant portion of Scar labels (about 1.9K) versus NoScar labels (about 3.7K). See below:
aggregate_data %>% filter(!is.null(signal)) %>% group_by(sheep) %>% count(Categorical_Label)
## # A tibble: 12 × 3
## # Groups: sheep [6]
## sheep Categorical_Label n
## <fct> <fct> <int>
## 1 S12 Scar 1969
## 2 S12 NoScar 988
## 3 S15 Scar 3365
## 4 S15 NoScar 2968
## 5 S17 Scar 1654
## 6 S17 NoScar 1506
## 7 S18 Scar 5040
## 8 S18 NoScar 1869
## 9 S20 Scar 1929
## 10 S20 NoScar 3677
## 11 S9 Scar 2244
## 12 S9 NoScar 2083
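The per-sheep counts are easier to compare as shares. A minimal sketch using the counts printed above; the column name `share` is introduced here for illustration.

```r
library(dplyr)
library(tibble)

# Counts copied from the per-sheep table above.
label_counts <- tribble(
  ~sheep, ~Categorical_Label, ~n,
  "S12",  "Scar",   1969,
  "S12",  "NoScar",  988,
  "S15",  "Scar",   3365,
  "S15",  "NoScar", 2968,
  "S17",  "Scar",   1654,
  "S17",  "NoScar", 1506,
  "S18",  "Scar",   5040,
  "S18",  "NoScar", 1869,
  "S20",  "Scar",   1929,
  "S20",  "NoScar", 3677,
  "S9",   "Scar",   2244,
  "S9",   "NoScar", 2083
)

# Share of each label within its sheep, keeping only the Scar rows.
scar_share <- label_counts %>%
  group_by(sheep) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  filter(Categorical_Label == "Scar") %>%
  arrange(share)

print(scar_share)
```

On these counts S20 has the lowest Scar share of the six sheep, although roughly a third of its points are still labelled Scar.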
The same breakdown by sheep, catheter type, and wavefront:
aggregate_data %>% filter(!is.null(signal)) %>% group_by(sheep,Catheter_Type,WaveFront) %>%
count(Categorical_Label)
## # A tibble: 36 × 5
## # Groups: sheep, Catheter_Type, WaveFront [18]
## sheep Catheter_Type WaveFront Categorical_Label n
## <fct> <fct> <fct> <fct> <int>
## 1 S12 Penta LVp Scar 320
## 2 S12 Penta LVp NoScar 177
## 3 S12 Penta RVp Scar 819
## 4 S12 Penta RVp NoScar 358
## 5 S12 Penta SR Scar 830
## 6 S12 Penta SR NoScar 453
## 7 S15 Penta LVp Scar 1079
## 8 S15 Penta LVp NoScar 971
## 9 S15 Penta RVp Scar 862
## 10 S15 Penta RVp NoScar 861
## # ℹ 26 more rows
Distribution of the window-of-interest length (To - From), restricted to points with a non-zero window, i.e. assuming there is a signal. This reflects the manual calibration of the starting point of each window of interest.
summary_window_length <- aggregate_data %>%
mutate(length_window = To - From) %>% filter(length_window != 0) %>%
group_by(WaveFront, Categorical_Label) %>% select(length_window)
## Adding missing grouping variables: `WaveFront`, `Categorical_Label`
# layout() has no facet_row/facet_col attributes, so the two labels are overlaid by colour instead
summary_window_length %>%
ungroup() %>%
plot_ly(x = ~length_window, color = ~Categorical_Label, type = "histogram", alpha = 0.6) %>%
layout(barmode = "overlay",
xaxis = list(title = "Window of interest length"),
yaxis = list(title = "Frequency"),
title = "Histogram of Distinct Windows of Interest")
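plotly's `layout()` does not accept `facet_row` or `facet_col`, so a WaveFront by Categorical_Label grid needs a different route. One option is a sketch like the following, assuming ggplot2 is available; `toy_windows` and its random values are made up to stand in for `summary_window_length`.

```r
library(ggplot2)

# Hypothetical stand-in for summary_window_length.
set.seed(1)
toy_windows <- data.frame(
  length_window = c(rnorm(50, 130, 20), rnorm(50, 150, 25)),
  WaveFront = rep(c("SR", "LVp"), each = 50),
  Categorical_Label = rep(c("Scar", "NoScar"), times = 50)
)

# One histogram panel per WaveFront (rows) and label (columns).
p <- ggplot(toy_windows, aes(x = length_window)) +
  geom_histogram(bins = 30) +
  facet_grid(WaveFront ~ Categorical_Label) +
  labs(x = "Window of interest length", y = "Frequency",
       title = "Histogram of Distinct Windows of Interest")

print(p)

# plotly::ggplotly(p) would convert this into an interactive plotly figure.
```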
Mean window-of-interest length by sheep, excluding zero-length windows:
aggregate_data %>% filter(!is.na(Categorical_Label)) %>%
mutate(length_window = To - From) %>% group_by(sheep) %>% filter(length_window != 0) %>%
summarise(mean_window = mean(length_window, na.rm = TRUE))
## # A tibble: 6 × 2
## sheep mean_window
## <fct> <dbl>
## 1 S12 128.
## 2 S15 149.
## 3 S17 136.
## 4 S18 125.
## 5 S20 148.
## 6 S9 257
See save_plots.R for plots of signals by WaveFront and Sheep